393 research outputs found

    Classification and Selection of Biomarkers in Genomic Data Using LASSO

    High-throughput gene expression technologies such as microarrays have been used in a variety of scientific applications. Most work has focused either on assessing univariate associations between gene expression profiles and clinical outcome (variable selection) or on developing classification procedures with gene expression data (supervised learning). We consider a hybrid variable selection/classification approach based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized using the receiver operating characteristic (ROC) curve. Under a specific probability model, this leads to the consideration of linear discriminant functions. We incorporate an automated variable selection approach using LASSO. An equivalence between LASSO estimation and support vector machines allows model fitting with standard software. We apply the proposed method to simulated data as well as to data from a recently published prostate cancer study.
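    The abstract does not give the estimating equations, so the following is only a minimal sketch of the general idea, not the authors' method. It swaps in least-squares LASSO on labels coded as +1/-1 (fitted by cyclic coordinate descent, with soft-thresholding) as the sparse linear discriminant, then scores the resulting linear combination with the empirical AUC in its Mann-Whitney form. All function names, the penalty value, and the simulated data are hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes the columns of X and the response y are already centered,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # y - X b, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # (1/n) * x_j^T (partial residual excluding gene j's contribution)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def empirical_auc(scores, labels):
    """Mann-Whitney estimate of the AUC: P(case score > control score), ties count 1/2."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    total = 0.0
    for c in cases:
        for d in controls:
            total += 1.0 if c > d else (0.5 if c == d else 0.0)
    return total / (len(cases) * len(controls))


def center_columns(X):
    """Subtract each column's mean."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


# Hypothetical simulated data: 100 samples, 10 genes, only genes 0 and 1 informative.
rng = random.Random(0)
n, p = 100, 10
labels = [i % 2 for i in range(n)]
X = center_columns(
    [[rng.gauss(2.0 * lab if j < 2 else 0.0, 1.0) for j in range(p)] for lab in labels]
)
y = [1.0 if lab == 1 else -1.0 for lab in labels]  # balanced classes, so mean(y) = 0
b = lasso_cd(X, y, lam=0.2)
selected = [j for j, bj in enumerate(b) if bj != 0.0]
scores = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
print("selected genes:", selected)
print("training AUC:", round(empirical_auc(scores, labels), 3))
```

    Least squares on coded two-class labels was chosen here because, without a penalty, it yields a direction proportional to Fisher's linear discriminant; the L1 penalty then zeroes out uninformative genes. The training AUC shown is optimistic and would need cross-validation in practice.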

    Empirical Bayes Identification of Tumor Progression Genes from Microarray Data

    The use of microarray data has become commonplace in medical and scientific experiments. We focus here on microarray data generated from cancer studies. For biomarker discovery, it is potentially important to identify genes whose expression levels correlate with tumor progression. In this article, we propose a simple procedure for identifying such genes, which we term tumor progression genes. The first stage involves estimation based on the proportional odds model. At the second stage, we calculate two quantities to adjust for the multiple testing problem: a q-value and a shrinkage estimator of the test statistic. The relationship between the proposed method and the false discovery rate is studied. The proposed methods are applied to data from a prostate cancer microarray study. (© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/55946/1/68_ftp.pd
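    The abstract does not say how its q-values are computed. As a hedged sketch, the standard step-up computation from per-gene p-values is shown below, with the proportion of true null genes (pi0) supplied by the user; pi0 = 1 reduces it to Benjamini-Hochberg adjusted p-values. The proportional odds fit and the shrinkage estimator are not reproduced, and the p-values are hypothetical.

```python
def q_values(p_values, pi0=1.0):
    """Step-up q-values: q_(i) = min over j >= i of pi0 * m * p_(j) / j (sorted p-values).

    pi0 is the assumed proportion of true null genes; pi0 = 1 gives
    Benjamini-Hochberg adjusted p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by increasing p
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[idx] / rank)
        q[idx] = running_min
    return q


# Hypothetical per-gene p-values, e.g. from a proportional odds model fitted per gene.
p = [0.01, 0.04, 0.03, 0.20]
q = q_values(p)
print([round(v, 4) for v in q])
```

    Genes whose q-value falls below a chosen threshold (for example 0.05) form a list whose expected false discovery rate is controlled at roughly that level.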

    PubMed QUEST: The PubMed Query Search Tool. An informatics tool to aid cancer centers and cancer investigators in searching the PubMed databases

    Searching PubMed for citations related to a specific cancer center or group of authors can be labor-intensive. We have created a tool, PubMed QUEST, to aid in rapidly searching PubMed for publications of interest. It was designed to meet the needs of entire cancer centers as well as individual investigators. Our institution’s cancer center administration and investigators have had a favorable experience using the tool, and we believe it could easily be adapted to other institutions. Use of the tool has revealed limitations of automated searches for publications based on an author’s name, especially for common names. These limitations could likely be solved if the PubMed database assigned a unique identifier to each author.
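    The abstract does not describe how PubMed QUEST issues its queries. As an illustration only, the sketch below builds an author-based query for NCBI's public E-utilities ESearch endpoint, optionally narrowed by an affiliation term, which is one common way to cope with the common-name problem the abstract mentions. The author names and affiliation are hypothetical, and nothing here is the tool's actual code.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_author_query(authors, affiliation=None):
    """OR together author names; optionally AND an affiliation term to narrow common names."""
    term = "(" + " OR ".join(f"{name}[Author]" for name in authors) + ")"
    if affiliation:
        term += f" AND {affiliation}[Affiliation]"
    return term


def esearch_url(term, retmax=100):
    """Build an ESearch request URL for PubMed (the response is an XML list of PubMed IDs)."""
    return ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})


# Hypothetical investigators and affiliation, for illustration only.
term = build_author_query(["Smith J", "Doe A"], affiliation="Michigan")
print(term)
print(esearch_url(term))
```

    Even with an affiliation filter, a name such as "Smith J" can still match unrelated authors, which is the limitation a unique author identifier would remove.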

    Prognostic meta-signature of breast cancer developed by two-stage mixture modeling of microarray data

    BACKGROUND: An increasing number of studies have profiled tumor specimens using distinct microarray platforms and analysis techniques. With the accumulating amount of microarray data, one of the most intriguing yet challenging tasks is to develop robust statistical models to integrate the findings. RESULTS: By applying a two-stage Bayesian mixture modeling strategy, we were able to assimilate and analyze four independent microarray studies to derive an inter-study validated "meta-signature" associated with breast cancer prognosis. Combining multiple studies (n = 305 samples) on a common probability scale, we developed a 90-gene meta-signature that was strongly associated with survival in breast cancer patients. The independent studies used different microarray platforms (spotted cDNAs, Affymetrix GeneChip, and inkjet oligonucleotides), and within each study the individually identified classifiers yielded gene sets predictive of survival in that study's cohort. The study-specific gene signatures, however, had minimal overlap with each other and performed poorly in pairwise cross-validation. The meta-signature, on the other hand, accommodated such heterogeneity and achieved comparable or better prognostic performance when compared with the individual signatures. Further, compared with a global standardization method, the mixture-model-based data transformation demonstrated superior properties for data integration and provided a solid basis for building classifiers at the second stage. Functional annotation revealed that genes involved in cell cycle and signal transduction activities were over-represented in the meta-signature. CONCLUSION: The mixture modeling approach unifies disparate gene expression data on a common probability scale, allowing robust, inter-study validated prognostic signatures to be obtained. With the emerging utility of microarrays for cancer prognosis, it will be important to establish paradigms for meta-analyzing disparate gene expression data to obtain prognostic signatures of potential clinical use.
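    The abstract compares the mixture-model transformation with "a global standardization method" but does not define that baseline. Purely as an assumption for illustration, the sketch below takes it to mean z-scoring each gene within each study and then pooling samples over the genes measured in every study; it is not the authors' two-stage mixture model. Gene names and values are hypothetical.

```python
from statistics import mean, pstdev


def standardize_study(expr):
    """expr: dict gene -> expression values for one study; returns within-study z-scores."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def pool_studies(studies):
    """Standardize each study separately, then concatenate samples for shared genes."""
    common = set.intersection(*(set(s) for s in studies))
    pooled = {g: [] for g in sorted(common)}
    for study in studies:
        z = standardize_study({g: study[g] for g in common})
        for g in pooled:
            pooled[g].extend(z[g])
    return pooled


# Hypothetical studies measured on different platforms (different scales).
study_a = {"G1": [1.0, 2.0, 3.0], "G2": [5.0, 5.0, 5.0]}
study_b = {"G1": [10.0, 30.0], "G3": [1.0, 2.0]}
pooled = pool_studies([study_a, study_b])
print(pooled)
```

    A per-study z-score removes location and scale differences between platforms but, unlike a probability-scale transformation, it still treats every study's distribution as if it had the same shape.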

    Pathway analysis reveals functional convergence of gene expression profiles in breast cancer

    BACKGROUND: A recent study has shown high concordance of several breast cancer gene signatures in predicting disease recurrence despite minimal overlap of the gene lists. This raises the question of whether common themes that are not apparent at the individual gene level underlie such prediction concordance. We therefore studied the similarity of these gene signatures on the basis of their functional annotations. RESULTS: We found that the signatures did not identify the same set of genes but converged on the activation of a similar set of oncogenic and clinically relevant pathways. A clear and consistent pattern across the four breast cancer signatures is the activation of the estrogen-signaling pathway. Other common features include the BRCA1-regulated pathway, reck pathways, and insulin signaling associated with the ER-positive disease signatures, all providing possible explanations for the prediction concordance. CONCLUSION: This work explains why independent breast cancer signatures that appear to perform equally well at predicting patient prognosis show minimal overlap in gene membership.
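    The contrast between gene-level and pathway-level agreement can be made concrete with a toy example. The sketch below computes the Jaccard overlap of two gene lists and of the pathway sets they map to; the genes, pathways, and annotation are hypothetical placeholders, and the study itself used functional annotation databases rather than this simple set mapping.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def to_pathways(genes, gene_to_pathways):
    """Map a gene list to the set of pathways containing at least one of its genes."""
    return {pw for g in genes for pw in gene_to_pathways.get(g, ())}


# Hypothetical toy annotation and signatures.
annotation = {
    "G1": {"P_A"}, "G2": {"P_B"}, "G3": {"P_A"},
    "G4": {"P_A"}, "G5": {"P_B"}, "G6": {"P_C"},
}
sig1 = ["G1", "G2", "G3"]
sig2 = ["G4", "G5", "G6"]
gene_overlap = jaccard(sig1, sig2)
pathway_overlap = jaccard(to_pathways(sig1, annotation), to_pathways(sig2, annotation))
print("gene-level:", gene_overlap, "pathway-level:", round(pathway_overlap, 3))
```

    Two signatures can share no genes at all yet point to largely the same pathways, which is the pattern the abstract reports.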

    Gene Fusion Markup Language: a prototype for exchanging gene fusion data

    BACKGROUND: An avalanche of next-generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates at finer resolution than previously achieved. However, amid the excitement and urgency of publishing observations from this recently developed, cutting-edge technology, no community standardization approach has arisen to organize and represent the data with their essential attributes in an interchangeable manner. Because transcriptome studies have been widely used for gene fusion discovery, the current non-standard mode of data representation could impede data accessibility, critical analyses, and further discoveries in the near future. RESULTS: Here we propose a prototype, Gene Fusion Markup Language (GFML), as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis and interpretation, and independent verification. As this database-independent exchange initiative evolves, it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is available at http://code.google.com/p/gfml-prototype/. CONCLUSION: The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating, and representing the significant features of gene fusion data in an interoperable and queryable fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses.
    http://deepblue.lib.umich.edu/bitstream/2027.42/112901/1/12859_2011_Article_5754.pd
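    To show what a machine-readable fusion record looks like in practice, the sketch below writes and re-reads one XML record with Python's standard ElementTree module. The element and attribute names are hypothetical and are not the GFML schema (the prototype's schema is at the project URL given in the abstract); the sample ID and read count are made-up placeholders, while TMPRSS2-ERG is used only as a familiar fusion example.

```python
import xml.etree.ElementTree as ET


def fusion_record(five_prime, three_prime, sample_id, supporting_reads):
    """Build one gene-fusion record as an XML element (hypothetical schema, not GFML)."""
    fusion = ET.Element("fusion", attrib={"sample": sample_id})
    ET.SubElement(fusion, "partner", attrib={"end": "5prime"}).text = five_prime
    ET.SubElement(fusion, "partner", attrib={"end": "3prime"}).text = three_prime
    ET.SubElement(fusion, "supportingReads").text = str(supporting_reads)
    return fusion


# Serialize a collection holding one record with placeholder values.
root = ET.Element("fusions")
root.append(fusion_record("TMPRSS2", "ERG", "sample_001", 42))
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Any consumer can parse the same text back without knowing how it was produced.
parsed = ET.fromstring(xml_text)
partners = [e.text for e in parsed.find("fusion").findall("partner")]
print(partners)
```

    Storing the 5' and 3' partners as separate, attributed elements keeps the fusion's orientation explicit, which a flat "GENE1-GENE2" string leaves to convention.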

    Covariate adjustment in the analysis of microarray data from clinical studies

    There is tremendous scientific interest in the analysis of gene expression data in clinical settings such as oncology. In this paper, we describe the importance of adjusting for confounders and other prognostic factors when selecting differentially expressed genes for follow-up validation studies. We develop two approaches to the analysis of microarray data in non-randomized clinical settings. The first extends the current significance analysis of microarrays (SAM) procedure to take other covariates into account. The second is a novel covariate-adjusted regression model based on the receiver operating characteristic (ROC) curve for the analysis of gene expression data. The ideas are illustrated using data from a prostate cancer molecular profiling study.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/47936/1/10142_2004_Article_120.pd
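    The abstract does not give the form of the covariate-extended SAM statistic. A minimal sketch, assuming a per-gene linear model with an intercept, a group indicator, and one covariate (for example age), fits the group effect by partialling out the covariate (Frisch-Waugh-Lovell) and divides it by its standard error plus a SAM-style fudge constant s0. The value s0 = 0.1 and the data are hypothetical; the ROC-based regression is not shown.

```python
from math import sqrt


def residualize(v, z):
    """Residuals from regressing v on an intercept and a single covariate z."""
    n = len(v)
    vbar, zbar = sum(v) / n, sum(z) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    slope = sum((zi - zbar) * (vi - vbar) for zi, vi in zip(z, v)) / szz
    return [vi - vbar - slope * (zi - zbar) for vi, zi in zip(v, z)]


def adjusted_d_statistic(expr, group, covariate, s0=0.1):
    """SAM-style d = beta_hat / (se + s0) for the group effect, adjusted for one covariate.

    Model: expr = a + beta * group + gamma * covariate + error.
    """
    n = len(expr)
    ry = residualize(expr, covariate)   # expression with covariate effect removed
    rg = residualize(group, covariate)  # group indicator with covariate effect removed
    sgg = sum(r * r for r in rg)
    beta = sum(a * b for a, b in zip(rg, ry)) / sgg
    rss = sum((a - beta * b) ** 2 for a, b in zip(ry, rg))
    se = sqrt(rss / (n - 3) / sgg)  # three fitted parameters: a, beta, gamma
    return beta / (se + s0)


# Hypothetical data for one gene: 4 controls, 4 cases, with age as a covariate.
group = [0, 0, 0, 0, 1, 1, 1, 1]
age = [50, 55, 60, 65, 52, 57, 62, 67]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.1]
expr = [2 + g + 0.05 * a + e for g, a, e in zip(group, age, noise)]
d = adjusted_d_statistic(expr, group, age)
print(round(d, 3))
```

    Adding s0 to the denominator keeps genes with very small estimated variance from producing inflated statistics, which is the role the fudge constant plays in SAM.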

    FADD, a novel death domain-containing protein, interacts with the death domain of Fas and initiates apoptosis

    Using the cytoplasmic domain of Fas in the yeast two-hybrid system, we have identified a novel interacting protein, FADD, which binds Fas and Fas-FD5, a mutant of Fas possessing enhanced killing activity, but not the functionally inactive mutants Fas-LPR and Fas-FD8. FADD contains a death domain homologous to the death domains of Fas and TNFR-1. A point mutation in FADD, analogous to the lpr mutation of Fas, abolishes its ability to bind Fas, suggesting a death domain to death domain interaction. Overexpression of FADD in MCF7 and BJAB cells induces apoptosis, which, like Fas-induced apoptosis, is blocked by CrmA, a specific inhibitor of the interleukin-1β-converting enzyme. These findings suggest that FADD may play an important role in the proximal signal transduction of Fas.

    Landscape of gene fusions in epithelial cancers: seq and ye shall find

    Enabled by high-throughput sequencing approaches, epithelial cancers across a range of tissue types are now seen to harbor gene fusions as an integral part of their landscape of somatic aberrations. Many gene fusions are found at high frequency in several rare solid cancers. Among common solid cancers, fusions involving the ETS family of transcription factors have been seen in approximately 50% of prostate cancers, whereas several other common solid cancers have been shown to harbor recurrent gene fusions at low frequencies. On the other hand, many gene fusions involving oncogenes, such as those encoding ALK, RAF, or FGFR kinase families, have been detected across multiple different epithelial carcinomas. Tumor-specific gene fusions can serve as diagnostic biomarkers or help define molecular subtypes of tumors; for example, gene fusions involving oncogenes such as ERG, ETV1, TFE3, NUT, POU5F1, NFIB, PLAG1, and PAX8 are diagnostically useful. Tumors with fusions involving therapeutically targetable genes such as ALK, RET, BRAF, RAF1, FGFR1–4, and NOTCH1–3 have immediate implications for precision medicine across tissue types. Thus, ongoing cancer genomic and transcriptomic analyses for clinical sequencing need to delineate the landscape of gene fusions. Distinguishing potential oncogenic “driver” fusions from “passenger” fusions, and functionally characterizing potentially actionable gene fusions across diverse tissue types, will help translate these findings into clinical applications. Here, we review recent advances in gene fusion discovery and the prospects for medicine.
    http://deepblue.lib.umich.edu/bitstream/2027.42/116210/1/13073_2015_Article_252.pd

    A Latent Variable Approach for Meta-Analysis of Gene Expression Data from Multiple Microarray Experiments

    BACKGROUND: With the explosion in data generated using microarray technology by different investigators working on similar experiments, it is of interest to combine results across multiple studies. RESULTS: In this article, we describe a general probabilistic framework for combining high-throughput genomic data from several related microarray experiments using mixture models. A key feature of the model is the use of latent variables that represent quantities that can be combined across diverse platforms. We consider two methods for estimating an index termed the probability of expression (POE). The first, reported in previous work by the authors, involves Markov chain Monte Carlo (MCMC) techniques. The second is a faster method based on the expectation-maximization (EM) algorithm. The methods are illustrated with application to a meta-analysis of datasets for metastatic cancer. CONCLUSION: The statistical methods described in the paper are available as an R package, metaArray 1.8.1, from Bioconductor (http://www.bioconductor.org/).
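    The abstract names EM as the faster estimator of the probability of expression but does not state the mixture components. As a simplified stand-in (not the POE model and not the metaArray implementation), the sketch fits a two-component Gaussian mixture to one gene's values by EM and returns each sample's posterior probability of belonging to the higher-mean component. That posterior lies in [0, 1] regardless of the platform's measurement scale, which is the spirit of the latent-variable framework described above. The data are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import mean, pstdev


def normal_pdf(x, mu, sd):
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2.0 * pi))


def e_step(x, w, mu, sd):
    """Posterior probability that each value came from component 1."""
    post = []
    for xi in x:
        lo = w[0] * normal_pdf(xi, mu[0], sd[0])
        hi = w[1] * normal_pdf(xi, mu[1], sd[1])
        total = lo + hi
        if total == 0.0:  # both densities underflowed: assign to the nearer mean
            post.append(1.0 if abs(xi - mu[1]) < abs(xi - mu[0]) else 0.0)
        else:
            post.append(hi / total)
    return post


def em_expression_probability(x, n_iter=200, min_sd=1e-3):
    """EM for a two-component Gaussian mixture on one gene's values.

    Returns the posterior probability of the higher-mean component for each
    sample, plus the fitted means (low, high).
    """
    n = len(x)
    xs = sorted(x)
    half = n // 2
    mu = [mean(xs[:half]), mean(xs[half:])]  # initialise from lower and upper halves
    sd = [max(pstdev(x), min_sd)] * 2
    w = [0.5, 0.5]
    for _ in range(n_iter):
        post = e_step(x, w, mu, sd)
        r1 = sum(post)
        r0 = n - r1
        if r0 < 1e-9 or r1 < 1e-9:  # one component is empty; stop early
            break
        # M-step: weighted proportions, means, and standard deviations.
        w = [r0 / n, r1 / n]
        mu = [sum((1 - p) * xi for p, xi in zip(post, x)) / r0,
              sum(p * xi for p, xi in zip(post, x)) / r1]
        sd = [max(sqrt(sum((1 - p) * (xi - mu[0]) ** 2 for p, xi in zip(post, x)) / r0), min_sd),
              max(sqrt(sum(p * (xi - mu[1]) ** 2 for p, xi in zip(post, x)) / r1), min_sd)]
    post = e_step(x, w, mu, sd)
    if mu[0] > mu[1]:  # make sure we report the higher-mean component
        post = [1.0 - p for p in post]
        mu = [mu[1], mu[0]]
    return post, mu


# Hypothetical log-expression values of one gene across nine samples.
values = [0.1, -0.2, 0.0, 0.3, -0.1, 5.0, 5.2, 4.8, 5.1]
prob, means = em_expression_probability(values)
print([round(v, 3) for v in prob], [round(m, 3) for m in means])
```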